# Modeling and Simulation of Power Consumption on Heterogeneous CPU Cores under Varying Workloads and Operating Conditions

Atharv Arun Desai

Department of CSA

Indian Institute of Science (IISc)

Bangalore, India

atharvarun@iisc.ac.in

Boul Chandra Garai

Department of CSA

Indian Institute of Science (IISc)

Bangalore, India

chandraboul@iisc.ac.in

Himanshu Srivastava

Department of CSA

Indian Institute of Science (IISc)

Bangalore, India
himanshusriv@iisc.ac.in

Vaisakh P S
Department of CSA
Indian Institute of Science (IISc)
Bangalore, India
vaisakhp@iisc.ac.in

Abstract—This is the Phase-2 report for the E0-240 Modeling and Simulation course project. The main objective of the project is to apply concepts learned in the E0-240 course to the modeling and simulation of a real-world system, in this case a multicore, heterogeneous CPU. The project focuses on developing a power consumption model for a simulated full system [1] under varying workloads. The model will take into consideration various operating conditions of the CPU, such as dynamic frequency scaling and heterogeneous cores [2].

Index Terms—Modeling, simulation, heterogeneous CPU cores, power consumption

## I. BACKGROUND

Power consumption is one of the key performance indices of any embedded or mobile device that operates on a power budget, as it directly impacts the user experience and usability of such devices. Consequently, the need for accurate power models in simulation environments has increased, so that designers and manufacturers can measure the impact of any new functionality or optimization being prototyped. Insights from such models allow key stakeholders in embedded product development to evaluate designs without waiting for hardware fabrication and rollout, thereby saving resources and investment.

One of the main motivations for this project is the top-down power modeling approach [3], which used Performance Monitoring Counters (PMCs) on actual hardware, along with overall power consumption data, to develop an empirical power model in the Gem5 simulator [1]. The average error achieved by this approach is claimed to be less than 6%. We will explore enhancements to this approach by factoring in additional CPU performance metrics.

## II. METHODOLOGY

Following the method of previous work [3], a top-down power model is developed that will be integrated into Gem5. The ODROID-XU4 [4] big.LITTLE development board is chosen as the first target for experimentation, data gathering, and validation. An overview of this hardware is shown in Figure 1. The ODROID-XU4 offers a minimalistic development platform built around a Samsung Exynos-5422 8-core SoC, with a quad-core ARM Cortex<sup>TM</sup>-A15 cluster at 2 GHz and a quad-core Cortex<sup>TM</sup>-A7 cluster at 1.3 GHz, along with 2 GB of LPDDR3 RAM operating at 933 MHz stacked with the CPU package. Both the Cortex-A15 and Cortex-A7 cores have 32 KB instruction and data caches each. For the L2 cache, the Cortex-A15 and Cortex-A7 clusters use 2 MB and 512 KB, respectively.



Fig. 1. ODROID-XU4 board overview, image source [4]

### A. Modeling and Development Strategy

A simulation of the ODROID-XU4/Exynos-5422 that closely resembles its CPU operating parameters will be integrated into Gem5. A SmartPower3 [5] power monitor unit is used alongside the ODROID-XU4, as represented in Figure 2, to measure overall power consumption on the hardware, while most of the peripheral modules on the board are kept switched off to reduce variation in the measured data. In addition, perf [6] is used to gather PMC data points. A summary of the data points gathered for this modeling exercise is listed in Table I. These data points are sampled at an interval of 100 ms from both perf and the SmartPower3 module.

Fig. 2. Experiment setup for power data gathering from ODROID-XU4 [4] hardware

TABLE I
POWER AND PERFORMANCE FEATURES GATHERED FROM THE EXPERIMENT SETUP

| Statistics type | Source          | Details                                                              |
|-----------------|-----------------|----------------------------------------------------------------------|
| CPU Cycles      | perf [6]        | CPU cycles, bus cycles, instructions, CPU frequency, CPU idle states |
| Branches        | perf            | Branch instruction and speculative operation statistics              |
| Caches          | perf            | Cache references, misses at various levels                           |
| Power           | SmartPower3 [5] | Power drawn from supply                                              |
| Misc            | perf            | CPU migrations, virtual memory, etc.                                 |
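
As a rough illustration of how the 100 ms PMC samples might be turned into per-interval feature rows, the sketch below parses interval output of the kind produced by `perf stat -I 100 -x,`. The field order (timestamp, count, unit, event name, further fields) and the sample lines are assumptions made for illustration, not data from the experiment; the actual column layout should be checked against the installed perf version.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample of `perf stat -I 100 -x, -e cycles,instructions,cache-misses`
# output. Assumed field order: timestamp, count, unit, event, (further fields ignored).
SAMPLE = """\
0.100112,152340021,,cycles,100089433,100.00,,
0.100112,98211034,,instructions,100089433,100.00,0.64,insn per cycle
0.100112,<not counted>,,cache-misses,0,0.00,,
0.200231,149870554,,cycles,100101876,100.00,,
0.200231,101456120,,instructions,100101876,100.00,0.68,insn per cycle
0.200231,48211,,cache-misses,100101876,100.00,,
"""


def parse_perf_intervals(text):
    """Group counter values by interval timestamp: {timestamp: {event: count}}."""
    rows = defaultdict(dict)
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        timestamp, count, _unit, event = fields[:4]
        try:
            ts = float(timestamp)
        except ValueError:
            continue  # skip comment or header lines
        try:
            value = int(count)
        except ValueError:
            value = None  # markers such as "<not counted>" carry no numeric value
        rows[ts][event] = value
    return dict(rows)


samples = parse_perf_intervals(SAMPLE)
for ts in sorted(samples):
    print(ts, samples[ts])
```

Each resulting row can then be joined with the power reading taken closest to the same timestamp, which is one possible way to align the two data sources.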

A set of preliminary workloads that induce load on the CPU and memory is executed on the ODROID-XU4 device while the power consumption and PMC data are recorded simultaneously. The workloads being considered are listed in Table II. As of now, a total of 5 workloads have been employed. Furthermore, integration of SPEC2017 will allow inclusion of up to 43 feasible benchmarks to increase the quantity of data.

TABLE II
LIST OF WORKLOADS BEING USED FOR DATA GATHERING AND VALIDATION

| Workload type    | Workloads                               | Used    |
|------------------|-----------------------------------------|---------|
| Stress Test      | stress command [7]                      | ✓       |
| Video Encoding   | ffmpeg encode [7]                       | ✓       |
| File Compression | gzip, bzip2, xz on complex datasets [8] | ✓       |
| Benchmark Suite  | SPEC2017 CPU Benchmarks [9]             | Planned |

## III. PHASE-2 PROGRESS

So far, the team has completed ramping up on the Gem5 simulator and its internals. For hardware data gathering, the experiment setup shown in Figure 2 has been established, and data gathering for the integrated workloads listed in Table II on the ODROID-XU4 hardware is also complete.

Mathematical modeling with the obtained data is in progress, and we expect to arrive at the empirical mathematical model soon. This model will be integrated into the simulated Gem5 Exynos-5422 instance for validation and further refinement.
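
For orientation, one common form for an empirical model of this kind is a linear regression over the gathered features. The expression below is only an assumed illustrative form: the symbols $\beta_i$ and $x_i$ and the number of features $N$ are not taken from the project, and the final model may differ.

```latex
P_{\text{est}} = \beta_0 + \sum_{i=1}^{N} \beta_i \, x_i
```

Here $P_{\text{est}}$ would be the estimated power for one sample, $x_i$ the $i$-th feature (for example, a PMC count normalized by the sampling interval), and $\beta_i$ the coefficients fitted against the measured power.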

## IV. DISCUSSION

In terms of workload execution, a workload can be scheduled on any of the available CPU cores in the big and LITTLE clusters, which greatly influences the runtime performance and power consumption behavior of the same workload. This can compromise the accuracy of the empirical model being developed. To avoid this, we will restrict execution of workloads to specific CPU cores by making use of the affinity management primitives available in Linux.
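
A minimal sketch of such pinning, using Python's `os.sched_setaffinity` wrapper around the Linux affinity call, is given below. The mapping of CPU numbers to clusters (0-3 for Cortex-A7, 4-7 for Cortex-A15) is an assumption about how Linux enumerates the Exynos-5422 and should be verified on the board; the names `CLUSTERS`, `select_cpus`, and `pin_current_process` are hypothetical helpers introduced only for this example.

```python
import os

# Assumed CPU numbering for the Exynos-5422 under Linux (verify on the board,
# e.g. via /proc/cpuinfo): CPUs 0-3 are Cortex-A7, CPUs 4-7 are Cortex-A15.
CLUSTERS = {"little": {0, 1, 2, 3}, "big": {4, 5, 6, 7}}


def select_cpus(cluster, available):
    """Return the CPUs of the requested cluster that are actually available."""
    chosen = CLUSTERS[cluster] & set(available)
    if not chosen:
        raise ValueError(f"no CPUs of cluster {cluster!r} among {sorted(available)}")
    return chosen


def pin_current_process(cluster):
    """Restrict the calling process (pid 0 means self) to one cluster; Linux only."""
    available = os.sched_getaffinity(0)
    os.sched_setaffinity(0, select_cpus(cluster, available))
    return os.sched_getaffinity(0)


# Pure selection step, shown on a hypothetical 8-CPU system.
print(sorted(select_cpus("big", range(8))))
```

Because the affinity mask is inherited by child processes, a workload started after calling `pin_current_process("big")` would stay on the selected cluster. The `taskset` command offers the same control from the shell.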

The variations in power consumption also appear to be influenced by the CPU dynamic clock performance governors [10] and associated modules in Linux performance management. We may investigate the influence of one or more governors on power consumption and integrate it into the empirical model being developed.
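
To record which governor and frequency limits were active during a run, the cpufreq attributes exposed through sysfs can be read alongside the other data points. The sketch below assumes the standard Linux cpufreq sysfs layout under `/sys/devices/system/cpu`; the helper name `read_cpufreq_state` and the choice of attributes are illustrative, and which files are present depends on the kernel and cpufreq driver on the board.

```python
from pathlib import Path

CPUFREQ_ROOT = Path("/sys/devices/system/cpu")  # standard Linux sysfs location
ATTRIBUTES = ("scaling_governor", "scaling_cur_freq",
              "scaling_min_freq", "scaling_max_freq")


def read_cpufreq_state(cpu, root=CPUFREQ_ROOT):
    """Return governor name and frequencies (kHz) for one CPU; None if unreadable."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    state = {}
    for name in ATTRIBUTES:
        try:
            text = (base / name).read_text().strip()
        except OSError:
            state[name] = None  # attribute absent or not readable on this system
            continue
        # Frequencies are plain integers in kHz; the governor is a name.
        state[name] = int(text) if text.isdigit() else text
    return state


print(read_cpufreq_state(0))
```

Logging this state per cluster (for example, CPU 0 and CPU 4 under the numbering assumed on the board) would allow the governor to be treated as an additional categorical feature.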

## V. VERIFICATION APPROACH

Reddy et al. [3] discuss the accuracy evaluation that needs to be done between Gem5's simulation of the CPU and the actual hardware, in terms of differences in execution time and PMC count statistics. The same methods shall be extended to the evaluation of modeled power versus actual power.

The empirical model validation will include standard error comparisons, along with k-fold cross-validation to evaluate Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE).
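
The validation loop could look roughly like the sketch below, which computes MAPE and RMSE over k folds for a single-feature least-squares model. The data are synthetic and the one-feature model is a stand-in chosen to keep the example short; the helper names (`mape`, `rmse`, `fit_line`, `k_fold_scores`) are hypothetical and the project's actual model will use more features.

```python
import math
import random


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same unit as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (single feature, closed form)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def k_fold_scores(xs, ys, k=5, seed=0):
    """Return per-fold (MAPE, RMSE), holding each fold out once as the test set."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        actual = [ys[i] for i in held_out]
        predicted = [b0 + b1 * xs[i] for i in held_out]
        scores.append((mape(actual, predicted), rmse(actual, predicted)))
    return scores


# Synthetic data, NOT measurements: power (W) rising linearly with a made-up event rate.
rng = random.Random(1)
rates = [rng.uniform(0.1, 2.0) for _ in range(50)]
power = [1.5 + 2.0 * r + rng.gauss(0, 0.05) for r in rates]
for fold, (m, r) in enumerate(k_fold_scores(rates, power), start=1):
    print(f"fold {fold}: MAPE={m:.2f}%  RMSE={r:.3f} W")
```

Reporting the spread of the per-fold scores, not only their mean, gives an indication of how sensitive the fitted coefficients are to the choice of training samples.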

A key observation from the paper [3] is that the CPU frequency was locked to specific levels, such as 200 MHz, 600 MHz, 1000 MHz, and 1600 MHz, and the data was then compared between simulated and actual hardware. We will explore using the available frequency governors with unrestricted minimum and maximum frequency caps to identify improvement areas and limitations.

## VI. NEXT STEP

To integrate an empirical power model into Gem5, the key features will be identified through feature engineering on the obtained data set, along with the coefficients the model requires. The CPU model available in Gem5 will be extended to closely resemble the specifications of the Cortex-A15 and Cortex-A7 CPU cores. The verification approaches described in the previous section will be used for validation.
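
One simple starting point for the feature-engineering step is to rank candidate features by their correlation with measured power. The sketch below does this with a hand-written Pearson coefficient; correlation ranking is an assumed technique chosen for illustration, not necessarily the method the project will adopt, and the feature names and numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_features(features, power):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scored = {name: pearson(values, power) for name, values in features.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up numbers for illustration only, not data from the ODROID-XU4 runs.
power_w = [2.0, 2.5, 3.1, 3.4, 4.0]
features = {
    "cycles": [1.0, 1.5, 2.1, 2.4, 3.0],          # power shifted by a constant
    "cache_misses": [5.0, 3.0, 4.0, 6.0, 2.0],    # weakly related
    "cpu_migrations": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant, carries no information
}
for name, r in rank_features(features, power_w):
    print(f"{name}: r = {r:+.3f}")
```

Correlation only captures linear, one-at-a-time relationships, so strongly correlated features may still be redundant with each other and would need a further check before being included in the model.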

With the SPEC2017 workloads, more data will be gathered covering the CPU frequencies of different cores and clusters that influence power consumption, so as to improve the accuracy of the power model against the ODROID-XU4 and to identify the limitations of this approach.

## REFERENCES

- [1] A. Akram and L. Sawalha, "A survey of computer architecture simulation techniques and tools," IEEE Access, vol. 7, pp. 78120-78145,
- [2] ARM Inc., "big.LITTLE technology: The future of mobile," white paper, 2013. [Online]. Available: https://www.arm.com/
- [3] B. K. Reddy, M. J. Walker, D. Balsamo, S. Diestelhorst, B. M. Al-Hashimi, and G. V. Merrett, "Empirical cpu power modelling and estimation in the gem5 simulator," 2017 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), pp. 1-8, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:10100084
- [4] "Odroid-xu4: Big.little development board," Apr 2023. [Online]. Available: https://wiki.odroid.com/odroid-xu4/odroid-xu4
- [5] "Smartpower3: Power monitor," May 2023. [Online]. Available: https://wiki.odroid.com/accessory/power\_supply\_battery/smartpower3
- [6] J. Kukunas, "Chapter 8 perf," in *Power and Performance*. Boston: Morgan Kaufmann, 2015, pp. 137-165.
- [7] A. L. Wiki, "Stress testing," Aug 2023. [Online]. Available: https://wiki.archlinux.org/title/Stress\_testing
- [8] P. PeaZip project TOS, "Compression benchmark: 7-zip, peazip, winrar, winzip comparison," Aug 2023. [Online]. Available: https://peazip.github.io/peazip-compression-benchmark.html
- [9] J. Bucek, K.-D. Lange, and J. v. Kistowski, "Spec cpu2017: Next-generation compute benchmark," in Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, ser. ICPE '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 41-42. [Online]. Available: https://doi.org/10.1145/3185768.3185771
- [10] C. Scordino, L. Abeni, and J. Lelli, "Energy-aware real-time scheduling in the linux kernel," in Proceedings of the 33rd Annual ACM Symposium on Applied Computing, ser. SAC '18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 601-608. [Online]. Available: https://doi.org/10.1145/3167132.3167198